Method of encoding video with film grain

ABSTRACT

A system for providing improved video quality and compression efficiency during encoding by detecting video segments having film grain approaching the “Red Lady” problem. The system detects when film grain approaches the level of the “Red Lady” problem by measuring frame-by-frame temporal differences (ME scores). From the ME scores, two key indicators are identified: (1) the average temporal difference in frames with an intermediate motion level is higher than in frames of non-noisy video; and (2) the fluctuation of the temporal differences between frames in a group is very small. When these indicators identify a high film grain video, a signal is provided to an encoder, which allocates fewer bits to I frames and more bits to P and B frames than for other frames of video without comparable film grain.

CROSS REFERENCE TO RELATED APPLICATION

This application is a continuation of U.S. patent application Ser. No. 14/962,814 filed Dec. 8, 2015, which claims priority under 35 U.S.C. § 119(e) from earlier filed U.S. Provisional Application Ser. No. 62/099,372 filed on Jan. 5, 2015 and incorporated herein by reference in its entirety.

BACKGROUND

Technical Field

The present invention relates to a process for improving video quality when encoding video with film grain. More particularly, the present invention relates to a solution for improving video quality when film grain is present on a level similar to the “Red Lady” noise problem.

Related Art

Film grain is hard to compress in an encoder. It requires more bits to encode than many other kinds of content for any level of video quality. Film grain may be thought of as a particular kind of spatial temporal noise. As such, film grain has low temporal predictability from one frame to any other frame of video. Thus, the encoding process is limited in its ability to leverage inter-frame estimation to achieve significant compression efficiency.

In some encoders, not enough bits are allocated to inter-predicted pictures. The result can be significant video quality artifacts such as I-frame beating and intermittent repetitive loss and recovery of spatial texture. Even encoders that can allocate significant bits might not be able to eliminate substantial noise, such as when noise is as high as in the “Red Lady” video frames.

A. The “Red Lady” Problem

A frame of the “Red Lady” video is illustrated in FIG. 1. The “Red Lady” video shows a lady walking alongside a soccer field with a grassy background. The scene itself is simple, but the entire video is filled with film grain.

Film grain is like random noise. It requires a lot of bits to encode and is not temporally predictable, which makes high film grain video, in particular the “Red Lady” video, very difficult to encode.

A common practice for encoding video with film grain is to encode a good quality I frame as a reference frame for predicting subsequent predictive frames (P or B frames). However, since the I frame and the P and B frames all contain film grain, the prediction is poor, and many bits are needed to encode the unpredicted high frequency components. If too many bits are allocated to the I frame, later P and B frames may be allocated fewer bits than they need, and their quality suffers. The good quality I frame, thus, may not help with the subsequent P and B frames.

In the “Red Lady” video, the random noise level is very high. Thus, a beating effect will be seen, because the quality of the pictures varies too much between frame types. Allocating more bits to the I frame does not help with reducing film grain in subsequent P and B frames in a typical encoder.

B. The “Dirty Window” Problem

As shown in FIG. 2, the difference between two consecutive frames is mostly noise. Encoding a good quality I frame for these costs too many bits and leaves fewer bits for predictive frames. Moreover, the high quality I frame, even with additional bits allocated, is not a good reference frame because the noisy temporal differences cannot be motion predicted well. With the I frame as a reference, the film grains in the predictive frames would be poorly encoded, and create a quality disparity between I and predictive frames, as illustrated in FIG. 3. Thus, allocating more bits to the I frame creates a “Dirty Window” for future film grain elimination in P and B frames.

Accordingly, it is desirable to provide better solutions for eliminating film grain comparable to the “Red Lady” video, and to avoid creating “Dirty Window” I frames.

SUMMARY

Embodiments of the present invention provide a system that enables improved video quality and compression efficiency during encoding by detecting video segments having film grain approaching the “Red Lady” problem and then optimizing the bit allocation between intra- and inter-predicted pictures using bit allocation variation between I, P and B type frames.

To optimize the bit allocation when a video clip is identified as a “Red Lady” like clip, embodiments of the present invention encode smaller I frames and allocate more bits to P and B frames. Since allocating more bits to the I frame when the “Red Lady” film grain problem occurs does not enable better prediction for encoding in the P and B frames, additional bits to the I frame are not necessary. Thus, allocating the extra bits to the P and B frames instead of the I frame enables reduction of film grain when the “Red Lady” like film grain problem occurs, and the “Dirty Window” I frame issue will no longer be a consideration.

To identify the film grain level and determine when the optimization of bit allocation away from I frames to P and B frames should occur, a temporal analysis of the available motion-prediction data is provided. For the temporal analysis, measurements of plotted frame-by-frame temporal differences (ME scores) of the received videos are determined. From the ME scores, two key indicators are identified: (1) the average temporal difference in frames with an intermediate motion level (i.e., an ME score greater than 20) is higher than in frames of non-noisy video with intermediate motion; and (2) the fluctuation of the temporal differences between frames in a group is very small, unlike non-noisy video with natural motion, which has higher motion differences without the noise. These two indicators are used to identify when a special bit allocation ratio between the I, P and B frames should be applied so that there will tend to be less difference between frame types for film-grain content.

The system according to embodiments of the present invention uses a preprocessing filter that analyzes video frames prior to the encoder. The preprocessing filter computes the temporal difference score and stores it in a queue of data provided with the frames to the encoder. The encoder analyzes the temporal difference scores. If the average of all temporal differences is higher than a threshold and their variance is smaller than a threshold, the video is taken to contain significant film grain or noise. Based on the level of film grain or noise detected, the encoder allocates bits to I, P and B frames dynamically.

BRIEF DESCRIPTION OF THE DRAWINGS

Further details of the present invention are explained with the help of the attached drawings in which:

FIG. 1 shows a frame from the “Red Lady” video;

FIG. 2 illustrates that for the “Red Lady” video frames, the difference between two consecutive frames is mostly noise;

FIG. 3 illustrates that with an I frame as a reference, the film grains in the predictive frames would be poorly encoded, and create a quality disparity between I and predictive frames;

FIGS. 4A-4F show the measured and plotted frame-by-frame temporal differences (ME scores) of various video clips;

FIG. 5 is a diagram of components for implementing embodiments of the present invention in an encoding system; and

FIG. 6 is a flow chart illustrating steps for implementing encoding with film grain according to embodiments of the present invention.

DETAILED DESCRIPTION

For embodiments of the present invention, if a clip can be identified as a “Red Lady” like clip, good quality can be achieved by encoding smaller I frames and allocating more bits to P and B frames.

To help understand how to determine when a clip is a “Red Lady” type clip, several different clips are analyzed. FIGS. 4A-4F show the measured and plotted frame-by-frame temporal differences (ME scores) of various video clips. In FIGS. 4A-4F, the X axis is a frame index and the Y axis shows a scaled ME score ranging from 0 to 100.

From the measurements in FIGS. 4A-4F, two key indicators of noisy video are found: (1) the average temporal difference is at an intermediate level (an ME score greater than 20), and the ME score is higher than that of non-noisy video with intermediate motion; and (2) the fluctuation of the temporal differences is very small, unlike non-noisy video with natural motion. These two indicators are, thus, used to identify “Red Lady” like video frames from any streaming video.

FIGS. 4A-4C illustrate the ME score levels for video with noise and limited or no motion. In FIG. 4C, the “Red Lady” video has a constant noise level ME score of just over 20. The “Sweep” video noise levels of FIG. 4B are very high, with an ME noise level of near 100. The “Zone Plate” noise with no motion and a set noise level has an ME score of just over 20 and can be used as a reference for ME levels. Note that for the “Blacksmith” video of FIG. 4D, a normal video that does not need special treatment during encoding using embodiments of the present invention, the average temporal difference is far below the ME level of 20 found with the “Red Lady” video of FIG. 4C.

Unlike the “Blacksmith” of FIG. 4D, the “Sprinkler Lady” of FIG. 4E meets both key criteria: (1) an ME score over 20 and (2) a small fluctuation of the temporal differences, even taking motion into account. The final video of “Basketball” in FIG. 4F has a relatively high ME score, but it is less than 20, and the motion in the video accounts for ME levels that on average may push the total ME score over 20. However, under the criteria of (1) an average temporal difference with an ME score greater than 20 and (2) a very small fluctuation of the temporal differences, the video of FIG. 4F does not require embodiments of the present invention to be used during encoding.
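As an illustration only, the two criteria discussed for FIGS. 4A-4F can be sketched in C as a test on the mean and variance of a window of scaled ME scores. The function name looks_like_film_grain, the macro names, and the variance limit of 2 are assumptions made for this sketch; only the ME level of 20 comes from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical thresholds for this sketch. The mean level of 20 mirrors
 * the ME score discussed in the text; the variance limit is assumed. */
#define GRAIN_MEAN_MIN 20u
#define GRAIN_VAR_MAX   2u

/* Returns 1 when a window of scaled ME scores (0..100) has a mean above
 * GRAIN_MEAN_MIN and a population variance at or below GRAIN_VAR_MAX,
 * i.e. a steady, elevated temporal difference. Returns 0 otherwise. */
int looks_like_film_grain(const uint8_t *me_score, size_t n)
{
    uint32_t sum = 0, var = 0, mean;
    size_t i;

    assert(me_score != NULL && n > 0);

    for (i = 0; i < n; i++)
        sum += me_score[i];
    mean = sum / (uint32_t)n;               /* integer mean, truncated */

    for (i = 0; i < n; i++) {
        int32_t d = (int32_t)me_score[i] - (int32_t)mean;
        var += (uint32_t)(d * d);
    }
    var /= (uint32_t)n;                     /* integer variance, truncated */

    return mean > GRAIN_MEAN_MIN && var <= GRAIN_VAR_MAX;
}
```

With synthetic score windows (not measurements from the figures), a steady trace just over 20 passes the test, a low trace fails on the mean, and a trace with a mean over 20 but large swings fails on the variance.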

FIG. 5 is a diagram of one embodiment of components for implementing embodiments of the present invention in an encoding system. In FIG. 5, the preprocessing filter 500 computes the temporal difference score and stores it in the queue 502. The encoding in encoder 506 is delayed by the frame buffer 504 until temporal difference scores of N frames are collected in the queue 502. The encoder 506 analyzes the temporal difference scores of the N frames. If the average of all temporal differences is higher than a threshold and their variance is smaller than a threshold, the video is taken to contain significant film grain or noise. Based on the level of film grain or noise detected, the encoder 506 allocates bits to I, P and B frames dynamically according to embodiments of the present invention described herein. Generally, if the level of film grain or noise is high, the encoder allocates more bits to P and B frames than for other content.
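The buffering behavior described for FIG. 5 can be sketched, under stated assumptions, as a fixed-size queue of ME scores that reports when N scores have been collected and then yields their mean and variance. The type score_queue_t, the function names, and the choice of N = 8 are hypothetical; the description does not specify N or a data layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SCORE_QUEUE_LEN 8   /* hypothetical N; the text leaves N unspecified */

typedef struct {
    uint8_t score[SCORE_QUEUE_LEN]; /* most recent scaled ME scores */
    size_t  count;                  /* valid entries, at most SCORE_QUEUE_LEN */
    size_t  head;                   /* slot the next score is written to */
} score_queue_t;

void score_queue_init(score_queue_t *q)
{
    assert(q != NULL);
    q->count = 0;
    q->head = 0;
}

/* Called once per frame by the (hypothetical) preprocessing stage.
 * When the queue is full, the oldest score is overwritten. */
void score_queue_push(score_queue_t *q, uint8_t me_score)
{
    assert(q != NULL);
    q->score[q->head] = me_score;
    q->head = (q->head + 1) % SCORE_QUEUE_LEN;
    if (q->count < SCORE_QUEUE_LEN)
        q->count++;
}

/* Encoding is held back until N scores are available. */
int score_queue_ready(const score_queue_t *q)
{
    assert(q != NULL);
    return q->count == SCORE_QUEUE_LEN;
}

/* Integer mean and population variance over the buffered scores. */
void score_queue_stats(const score_queue_t *q, uint32_t *mean, uint32_t *var)
{
    uint32_t sum = 0, acc = 0;
    size_t i;

    assert(q != NULL && mean != NULL && var != NULL && q->count > 0);

    for (i = 0; i < q->count; i++)
        sum += q->score[i];
    *mean = sum / (uint32_t)q->count;

    for (i = 0; i < q->count; i++) {
        int32_t d = (int32_t)q->score[i] - (int32_t)*mean;
        acc += (uint32_t)(d * d);
    }
    *var = acc / (uint32_t)q->count;
}
```

In this sketch the encoder side would poll score_queue_ready() before reading the statistics, which corresponds to the delay introduced by the frame buffer 504.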

FIG. 6 is a flow chart illustrating steps for implementing encoding when high film grain is detected according to embodiments of the present invention. First, in step 600 the video clips are received, such as the “Red Lady” clip illustrated in FIG. 4C. Next, in step 601, the temporal difference score, or ME score, is determined for the video frames and the result for each frame is stored in a queue to provide to the encoder. Next, in step 602 a determination is made to decide if film grain noise is high enough to constitute “Red Lady” type film grain that requires application of embodiments of the present invention. For the step 602 determination, if the average of a group of temporal differences is higher than a threshold and the variance is smaller than a threshold, the film grain noise is indicated to be significant for the frames of the video clip.

Once the determination is made in step 602, the determination is reviewed in step 603. If film grain noise for the clip is determined to be significant, then the program proceeds to step 604. If the film grain noise is determined to be insignificant, the program proceeds to step 605. In step 604, when high film grain noise is detected, encoding is performed by allocating bits so that the I frame at the beginning receives few additional bits and the P and B frames receive additional bits for encoding. In step 605, when film grain noise is not detected as high, a normal bit allocation is performed by the encoder.
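One way to picture the branch between steps 604 and 605 is as a choice between two sets of per-frame-type weights used to split a bit budget over a group of pictures. The sketch below is an assumption-laden illustration: the weight values, the type and function names, and the idea of a fixed per-group budget are all hypothetical and are not taken from the description or from Appendix A.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t w_i, w_p, w_b; } frame_weights_t;          /* relative weights */
typedef struct { uint32_t bits_i, bits_p, bits_b; } frame_targets_t; /* bits per frame */

/* Step 604 vs. step 605: pick a weight set. The values are illustrative
 * assumptions; the grain set only lowers the I-frame weight. */
frame_weights_t select_weights(int high_grain)
{
    const frame_weights_t normal = { 8u, 3u, 1u };
    const frame_weights_t grain  = { 4u, 3u, 1u };
    return high_grain ? grain : normal;
}

/* Splits gop_bits across n_i I frames, n_p P frames and n_b B frames in
 * proportion to the weights, returning the per-frame target for each type.
 * Assumes at least one frame of each type is present. */
frame_targets_t split_gop_budget(uint32_t gop_bits, uint32_t n_i,
                                 uint32_t n_p, uint32_t n_b,
                                 frame_weights_t w)
{
    frame_targets_t t;
    uint64_t denom;

    assert(n_i > 0 && n_p > 0 && n_b > 0);
    denom = (uint64_t)n_i * w.w_i + (uint64_t)n_p * w.w_p
          + (uint64_t)n_b * w.w_b;
    assert(denom > 0);

    t.bits_i = (uint32_t)((uint64_t)gop_bits * w.w_i / denom);
    t.bits_p = (uint32_t)((uint64_t)gop_bits * w.w_p / denom);
    t.bits_b = (uint32_t)((uint64_t)gop_bits * w.w_b / denom);
    return t;
}
```

Because the total budget is held constant in this sketch, lowering only the I-frame weight raises the per-frame targets of both P and B frames, which matches the direction of the reallocation described for step 604.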

Applying the algorithm shown in FIG. 6 reduces the “Dirty Window” effect. The algorithm also makes some high texture clips, such as the “Sprinkler Lady” of FIG. 4E, look sharper. The algorithm does not change the quality of other non-noisy clips that do not rise to the detected level of the “Red Lady” video.

For reference, Appendix A below shows an example of coding in “C” to implement the algorithm illustrated by FIG. 6.

For components shown, like the preprocessing filter 500 and the encoder 506, each component according to embodiments of the present invention can include a processor and memory to enable operation. The memory of each device stores code that is executable by the processor to enable the processor to perform the processes described herein. Further, the memory can be used to provide data storage with the data accessible by the processor to store or retrieve when performing operations.

Although the present invention has been described above with particularity, this was merely to teach one of ordinary skill in the art how to make and use the invention. Many additional modifications will fall within the scope of the invention as that scope is defined by the following claims.

APPENDIX A

//
// code for Red Lady film grain detection
#define RC_DYNAMIC_QMUL_ME_SCORE_MEAN_THRESHOLD_MIN 20
#define RC_DYNAMIC_QMUL_ME_SCORE_MEAN_THRESHOLD_MAX 75
#define RC_DYNAMIC_QMUL_ME_SCORE_VARIANCE_THRESHOLD 2

uint8_t get_qmul_b_increase_from_lookahead(hlenc_fbnode_t *fbnode, uint8_t first_pass_enc_id)
{
    img_par_t *imgpar = get_imgpar_by_fbnode(fbnode);
    uint32_t vfid = GET_DB_INDEX(fbnode->ext_vcap_pi);
    uint32_t offset_tf = 0, offset_bf = 0;
    scene_change_info_t scene_change_info;
    uint8_t scene_id_first = 0;
    int8_t me_score[RC_2PASS_LOOKAHEAD_DISTANCE_DYNAMIC_QMUL] = {0};
    uint32_t me_score_cnt = 0;
    uint32_t i = 0, sum = 0, var = 0;
    int8_t mean = 0;
    uint8_t b_increase = RC_DEFAULT_B_INCREASE_FOR_QMUL;

    // get offset of the me_score for top and bottom field
    offset_tf = (uint32_t)(&(((ext_vcap_package_t *)0)->sc_info_frm_top));
    offset_bf = (uint32_t)(&(((ext_vcap_package_t *)0)->sc_info_bot));

    // get the first scene_info
    get_epi_data(MAKE_DB_ID(first_pass_enc_id, (vfid & 0xFF)), (uint32_t)&scene_change_info, offset_tf, sizeof(scene_change_info_t));
    scene_id_first = scene_change_info.scene_id;

    // the me_score for the first frame is always big because it is
    // calculated between the new scene and old scene
    // we only want to use the me_score of the new scene
    do
    {
        vfid++;
        get_epi_data(MAKE_DB_ID(first_pass_enc_id, (vfid & 0xFF)), (uint32_t)&scene_change_info, offset_tf, sizeof(scene_change_info_t));
        // todo: check scene change instead of scene id?
        if (scene_id_first != scene_change_info.scene_id)
        // if (scene_change_info.sc_here)
        {
            orc_printf("scene_id_first %d scene_change_info.scene_id %d me_score_cnt %d", scene_id_first, scene_change_info.scene_id, me_score_cnt);
            break;
        }
        me_score[me_score_cnt++] = (scene_change_info.me_score >> 3);
        if (imgpar->pic_is_field)
        {
            // get me score for bottom field
            get_epi_data(MAKE_DB_ID(first_pass_enc_id, (vfid & 0xFF)), (uint32_t)&scene_change_info, offset_bf, sizeof(scene_change_info_t));
            me_score[me_score_cnt++] = (scene_change_info.me_score >> 3);
        }
    } while (me_score_cnt < RC_2PASS_LOOKAHEAD_DISTANCE_DYNAMIC_QMUL);

    // no scores collected (scene changed on the first lookahead frame):
    // keep the default and avoid dividing by zero below
    if (me_score_cnt == 0)
    {
        return b_increase;
    }

    // get the mean and variance
    for (i = 0; i < me_score_cnt; i++)
    {
        sum += me_score[i];
        // orc_printf("me_score[%d] %d", i, me_score[i]);
    }
    mean = sum / me_score_cnt;
    for (i = 0; i < me_score_cnt; i++)
    {
        var += (me_score[i] - mean) * (me_score[i] - mean);
    }
    var /= me_score_cnt;

    // this is to check redlady like noisy video.
    // When the average me_score is higher than a threshold but me_scores
    // have a very small fluctuation, it means the temporal prediction error
    // was primarily caused by a low level of noise not the actual natural
    // motion. In this case, we want to spend more bits on P and B frames
    // so we encode less skip blocks.
    if (mean > RC_DYNAMIC_QMUL_ME_SCORE_MEAN_THRESHOLD_MIN && mean < RC_DYNAMIC_QMUL_ME_SCORE_MEAN_THRESHOLD_MAX &&
        var <= RC_DYNAMIC_QMUL_ME_SCORE_VARIANCE_THRESHOLD)
    {
        b_increase = 25;
    }
    // orc_printf(" me_score_mean %d var %d b_increase %d", mean, var, b_increase);
    return b_increase;
}

What is claimed:
 1. A method for encoding video comprising: receiving video frames from an encoder configured to compress the video frames; receiving motion estimation (ME) data for the video frames; calculating a ME score for the video frames, wherein the ME score indicates a measurement of plotted frame-by-frame temporal differences; calculating an ME stability value measuring a statistical measure of the ME score over time; providing a high film grain indication signal to the encoder indicating a particular group of the video frames contains high film grain when the ME score for the segment exceeds a first threshold and the ME stability value for the segment is below a second threshold, wherein when the high film grain indication signal is provided, the encoder compresses the video frames by allocating less bits to I frames and more bits to P and B type frames than to other frames of the video.

 2. The method of claim 1, wherein when the high film grain indication is not provided, the encoder allocates more bits to the I frame than when the high film grain indication is provided.

 3. The method of claim 1, wherein the first threshold is greater than 20.